Travel Package Purchase Prediction - Case Study

Background and Context

Problem Statement

Objective

Data Dictionary

This dataset contains customer data for 'Visit with us'.

Customer details:

Customer interaction data:

Loading Libraries

Load Dataset

View the first and last 5 rows of the dataset

Understand the shape of the dataset

Observations:

Let us check for null values

Observations:

Let us check for duplicates
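The null and duplicate checks can be sketched as follows. This is a minimal illustration on a hypothetical toy frame; the rows are made up, and only the column names are borrowed from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real dataset is not reproduced here.
df = pd.DataFrame({
    "Age": [34.0, np.nan, 41.0, 34.0],
    "Gender": ["Male", "Female", "Fe Male", "Male"],
    "MonthlyIncome": [20000.0, 18500.0, np.nan, 20000.0],
})

# Number of missing values in each column
null_counts = df.isnull().sum()

# Number of rows that exactly repeat an earlier row
duplicate_rows = int(df.duplicated().sum())

print(null_counts)
print("Duplicate rows:", duplicate_rows)
```

If duplicates are found, `df.drop_duplicates()` returns a copy without them.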

Observations:

Check the data types of the columns for the dataset

Observations:

Drop the column that adds no value and has no statistical importance.

Observations:

Summary of the dataset

Observations:

Let us look at the unique values in each column

Observations:

Let us check whether there is a pattern in the missing data

Observations:

Let us look at the missing data in each column

Observations:

Missing Value Treatment

Let us treat the missing values in the Age column

Observations:

Let us treat the missing values in the MonthlyIncome column

Observations:

Let us treat the missing values in the TypeofContact, DurationOfPitch, NumberOfFollowups, PreferredPropertyStar, NumberOfTrips and NumberOfChildrenVisting columns
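The imputation strategy used for these columns is not stated in this outline. As a sketch under that caveat, one common choice is to fill numeric columns with the median and categorical columns with the mode. The data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using a few of the column names listed above.
df = pd.DataFrame({
    "TypeofContact": ["Self Enquiry", None, "Company Invited", "Self Enquiry"],
    "DurationOfPitch": [8.0, np.nan, 14.0, 10.0],
    "NumberOfFollowups": [3.0, 4.0, np.nan, 4.0],
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Assumption: the median is used because it is robust to outliers.
        df[col] = df[col].fillna(df[col].median())
    else:
        # Assumption: the most frequent category is used for text columns.
        df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```

A group-wise fill (for example, the median within each Designation) is another option when a column varies strongly by segment.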

Observations:

Datatype Conversions

Observations:

Data Preprocessing - Feature Engineering

Let us treat the 'Fe Male' value in the Gender column by grouping it under the Female category
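A minimal sketch of this cleanup, using a hypothetical toy column:

```python
import pandas as pd

# Hypothetical toy data containing the mistyped category.
df = pd.DataFrame({"Gender": ["Male", "Fe Male", "Female", "Fe Male"]})

# Map the mistyped label onto the intended category.
df["Gender"] = df["Gender"].replace("Fe Male", "Female")

print(df["Gender"].value_counts())
```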

Observations:

Let us group the MaritalStatus column into Married and Unmarried categories, reducing the number of categories from 4 to 2.
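The grouping can be sketched as below. The four category labels are an assumption made for illustration; only the target categories, Married and Unmarried, come from the text.

```python
import pandas as pd

# Assumed original labels; the outline only says there are four categories.
df = pd.DataFrame({"MaritalStatus": ["Married", "Single", "Divorced", "Unmarried"]})

# Keep "Married" as is and relabel every other value as "Unmarried".
is_married = df["MaritalStatus"] == "Married"
df["MaritalStatus"] = df["MaritalStatus"].where(is_married, "Unmarried")

print(df["MaritalStatus"].value_counts())
```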

Observations:

Let us bin Age, MonthlyIncome, NumberOfTrips and DurationOfPitch into ranges for clearer EDA plots and fewer category buckets
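Binning of this kind can be done with `pd.cut`. The sketch below bins Age only; the bin edges, labels and the `AgeBin` column name are hypothetical, since the ranges actually used are not given here.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Age": [18, 25, 37, 49, 61]})

# Hypothetical edges: each bin is open on the left and closed on the right,
# so (17, 30] covers ages 18 to 30.
age_edges = [17, 30, 45, 60, 100]
age_labels = ["18-30", "31-45", "46-60", "61+"]

df["AgeBin"] = pd.cut(df["Age"], bins=age_edges, labels=age_labels)

print(df)
```

The same pattern applies to MonthlyIncome, NumberOfTrips and DurationOfPitch with edges suited to each column.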

Univariate Analysis

Observations:

Bivariate Analysis

Observations:

Observations:

Let us plot stacked bars of ProdTaken against each of the other variables to check for relationships

Observations:

Customer Profiling

Let us check which groups of customers prefer which package

Observations:

Outlier Detection using Boxplots

Observations:

Let us treat the outliers using the capping method and check again.

Outlier Treatment
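One common form of capping clips values at the boxplot whiskers, that is, at 1.5 times the interquartile range beyond the quartiles. The sketch below assumes that variant; the helper name `cap_outliers` and the data are hypothetical.

```python
import pandas as pd


def cap_outliers(series, whisker=1.5):
    """Clip values that fall outside the boxplot whiskers (assumed IQR rule)."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - whisker * iqr, upper=q3 + whisker * iqr)


# Hypothetical toy data with one extreme value.
income = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 100.0])
capped = cap_outliers(income)

print(capped.tolist())
```

Capping keeps every row, which matters on a small dataset, at the cost of flattening genuine extremes.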

Let us check the result of the outlier treatment

Observations:

Model Building - Approach

  1. Prepare the data.
  2. Partition the data into train and test sets.
  3. Build the model on the train data.
  4. Tune the model if required.
  5. Evaluate the model on the test set.

Let us copy the dataset to a new one, dropping all columns not needed for model building.

Split Data
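A minimal sketch of the split, assuming ProdTaken is the target, categorical columns are one-hot encoded, and the split is stratified to preserve the class ratio. The 70/30 ratio, the random seed and the toy data are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows, 4 of which took the product.
df = pd.DataFrame({
    "Age": [25, 32, 47, 51, 38, 29, 44, 36, 58, 41] * 2,
    "Gender": ["Male", "Female"] * 10,
    "ProdTaken": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0] * 2,
})

# One-hot encode categorical features; drop_first avoids a redundant column.
X = pd.get_dummies(df.drop(columns="ProdTaken"), drop_first=True)
y = df["ProdTaken"]

# stratify=y keeps the share of positives similar in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print(X_train.shape, X_test.shape)
```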

Model evaluation criterion

Model can make wrong predictions as:

Which case is more important?
How to reduce this loss, i.e. how to reduce False Negatives?

Let's define a function that returns the metric scores (accuracy, recall and precision) on the train and test sets, and a function that shows the confusion matrix, so that we do not have to repeat the same code while evaluating models.
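Such helpers might look like the sketch below. The function names `get_metrics_score` and `make_confusion_matrix` and the synthetic data are assumptions for illustration, not the notebook's own code.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def get_metrics_score(model, X_train, X_test, y_train, y_test):
    """Return accuracy, recall and precision on the train and test sets."""
    scores = {}
    for name, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        pred = model.predict(X)
        scores[f"{name}_accuracy"] = accuracy_score(y, pred)
        scores[f"{name}_recall"] = recall_score(y, pred)
        scores[f"{name}_precision"] = precision_score(y, pred, zero_division=0)
    return scores


def make_confusion_matrix(model, X, y):
    """Rows are actual classes, columns are predicted classes (0 then 1)."""
    return confusion_matrix(y, model.predict(X), labels=[0, 1])


# Synthetic, imbalanced stand-in for the prepared dataset.
X, y = make_classification(n_samples=300, n_features=6,
                           weights=[0.8, 0.2], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

tree = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
scores = get_metrics_score(tree, X_train, X_test, y_train, y_test)
cm = make_confusion_matrix(tree, X_test, y_test)

print(scores)
print(cm)
```

False Negatives sit in the bottom-left cell of this matrix, `cm[1, 0]`, so recall is the score to watch when they are the costlier error.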

Build Decision Tree Model

Confusion Matrix -

Observation:

Observations:

Bagging Classifier

Observation:

Bagging Classifier with weighted decision tree
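A sketch of bagging over a class-weighted decision tree. The weights `{0: 0.2, 1: 0.8}`, the number of estimators and the synthetic data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data.
X, y = make_classification(n_samples=200, n_features=6,
                           weights=[0.8, 0.2], random_state=1)

# Assumed weights: errors on the minority class (1) cost more.
weighted_tree = DecisionTreeClassifier(class_weight={0: 0.2, 1: 0.8},
                                       random_state=1)

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
bagging_wt = BaggingClassifier(weighted_tree, n_estimators=10, random_state=1)
bagging_wt.fit(X, y)

print(bagging_wt.score(X, y))
```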

Observation:

Random Forest

Observation:

Observations:

Random forest with class weights

Observation:

Observations:

Tuning Models

Using GridSearch for hyperparameter tuning
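A minimal sketch of grid search with scikit-learn's `GridSearchCV`, scored on recall since False Negatives are the error to reduce. The parameter grid and the synthetic data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data.
X, y = make_classification(n_samples=200, n_features=6,
                           weights=[0.8, 0.2], random_state=1)

# Hypothetical grid; real grids are usually wider.
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]}

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="recall",  # optimise for recall on the positive class
    cv=5,
)
grid.fit(X, y)

# The best estimator is refit on all of X, y by default.
best_tree = grid.best_estimator_
print(grid.best_params_, grid.best_score_)
```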

Tuning Decision Tree

Observation:

Observations:

Tuning Bagging Classifier

Observation:

Tuning Random Forest

Observation:

Feature importance of Random Forest

Observation:

Boosting Models

AdaBoost Classifier

Observation:

Observations:

Gradient Boosting Classifier

Observations:

Observations:

XGBoost Classifier

Observations:

Observations:

Hyperparameter Tuning

AdaBoost Classifier

Observations:

Observations:

Gradient Boosting Classifier

Let's try using AdaBoost classifier as the estimator for initial predictions

Gradient Boost as compared to the model with default parameters:

Let us also try init=AdaBoostClassifier and check for any difference in performance
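In scikit-learn, the `init` parameter of `GradientBoostingClassifier` accepts an estimator that supplies the initial predictions. The sketch below passes an `AdaBoostClassifier`; the estimator counts and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

# Synthetic stand-in data.
X, y = make_classification(n_samples=200, n_features=6, random_state=1)

# AdaBoost provides the starting predictions; the boosting stages then
# fit the residual errors on top of them.
gbc_init = GradientBoostingClassifier(
    init=AdaBoostClassifier(n_estimators=10, random_state=1),
    n_estimators=20,
    random_state=1,
)
gbc_init.fit(X, y)

print(gbc_init.score(X, y))
```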

Observations:

Observations:

XGBoost Classifier

XGBoost has many hyperparameters that can be tuned to improve model performance.
Some of the important ones are:

Observations:

Observations:

Stacking Model
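A sketch of stacking with scikit-learn's `StackingClassifier`. The choice of base learners and of logistic regression as the final estimator is an assumption for illustration; the models actually stacked in this notebook are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data.
X, y = make_classification(n_samples=200, n_features=6, random_state=1)

# Assumed base learners, each given a short name.
estimators = [
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=1)),
    ("rf", RandomForestClassifier(n_estimators=20, random_state=1)),
    ("gb", GradientBoostingClassifier(n_estimators=20, random_state=1)),
]

# The final estimator is trained on cross-validated predictions
# of the base learners.
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)

print(stack.score(X, y))
```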

Observations:

Comparing all models

Observation:

Comparing Bagging, Decision Tree and Random Forest models

Observation:

Comparing Boosting and Stacking models

Observations: